The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice and about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
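As a rough illustration of the patch-based strategy mentioned above, the following minimal sketch computes the start coordinates of overlapping patches that tile a sample too large to process at once. The function names `patch_starts` and `patch_grid`, and the choice to place a final patch flush with the border, are assumptions made for illustration; the survey does not describe how participants implemented patching.

```python
from itertools import product
from typing import List, Tuple


def patch_starts(length: int, patch: int, stride: int) -> List[int]:
    """Start offsets of patches along one axis so the whole axis is covered."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    if patch >= length:
        return [0]  # a single patch already spans the axis
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)  # extra patch flush with the border
    return starts


def patch_grid(
    shape: Tuple[int, ...], patch: Tuple[int, ...], stride: Tuple[int, ...]
) -> List[Tuple[int, ...]]:
    """Start coordinates of all patches for an n-dimensional sample."""
    axes = [patch_starts(n, p, s) for n, p, s in zip(shape, patch, stride)]
    return list(product(*axes))


if __name__ == "__main__":
    # Hypothetical 3D volume of 64 x 256 x 256 voxels, cut into 32 x 128 x 128 patches.
    corners = patch_grid((64, 256, 256), (32, 128, 128), (32, 64, 64))
    print(len(corners), "patches; first corner:", corners[0])
```

Each returned tuple is the corner of one patch; a training loop would crop the sample at these corners and feed the crops to the model one at a time.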
Recently, the success of pre-training in the text domain has been fully extended to vision, audio, and cross-modal scenarios. The proposed pre-training models of different modalities are showing a rising trend of homogeneity in their model structures, which brings the opportunity to implement different pre-training models within a uniform framework. In this paper, we present TencentPretrain, a toolkit supporting pre-training models of different modalities. The core feature of TencentPretrain is its modular design. The toolkit uniformly divides pre-training models into 5 components: embedding, encoder, target embedding, decoder, and target. As almost all common modules are provided in each component, users can choose the desired modules from different components to build a complete pre-training model. The modular design enables users to efficiently reproduce existing pre-training models or build brand-new ones. We test the toolkit on text, vision, and audio benchmarks and show that it can match the performance of the original implementations.
Responding with multi-modal content has been recognized as an essential capability for an intelligent conversational agent. In this paper, we introduce the MMDialog dataset to better facilitate multi-modal conversation. MMDialog is composed of a curated set of 1.08 million real-world dialogues with 1.53 million unique images across 4,184 topics. MMDialog has two main and unique advantages. First, it is the largest multi-modal conversation dataset by number of dialogues, by a factor of 8. Second, it covers a large number of topics, which supports generalization to the open domain. To build an engaging dialogue system with this dataset, we propose and normalize two response-producing tasks based on retrieval and generative scenarios. In addition, we build two baselines for the above tasks with state-of-the-art techniques and report their experimental performance. We also propose a novel evaluation metric, MM-Relevance, to measure multi-modal responses. Our dataset and scripts are available at https://github.com/victorsungo/MMDialog.
Partial label learning (PLL) is a typical weakly supervised learning problem, where each sample is associated with a set of candidate labels. The basic assumption of PLL is that the ground-truth label must reside in the candidate set. However, this assumption may not be satisfied due to the unprofessional judgment of the annotators, thus limiting the practical application of PLL. In this paper, we relax this assumption and focus on a more general problem, noisy PLL, where the ground-truth label may not exist in the candidate set. To address this challenging problem, we propose a novel framework called "Automatic Refinement Network (ARNet)". Our method consists of multiple rounds. In each round, we purify the noisy samples through two key modules, i.e., noisy sample detection and label correction. To guarantee the performance of these modules, we start with warm-up training and automatically select the appropriate correction epoch. Meanwhile, we exploit data augmentation to further reduce prediction errors in ARNet. Through theoretical analysis, we prove that our method is able to reduce the noise level of the dataset and eventually approximate the Bayes optimal classifier. To verify the effectiveness of ARNet, we conduct experiments on multiple benchmark datasets. Experimental results demonstrate that ARNet is superior to existing state-of-the-art approaches in noisy PLL. Our code will be made public soon.
CNN-based surrogates have become prevalent in scientific applications to replace conventional time-consuming physical approaches. Although these surrogates can yield satisfactory results with significantly lower computation costs over small training datasets, our benchmarking results show that data-loading overhead becomes the major performance bottleneck when training surrogates with large datasets. In practice, surrogates are usually trained with high-resolution scientific data, which can easily reach the terabyte scale. Several state-of-the-art data loaders have been proposed to improve the loading throughput in general CNN training; however, they are sub-optimal when applied to surrogate training. In this work, we propose SOLAR, a surrogate data loader that can ultimately increase loading throughput during training. It builds on three key observations from our benchmarking and contains three novel designs. Specifically, SOLAR first generates a pre-determined shuffled index list and accordingly optimizes the global access order and the buffer eviction scheme to maximize data reuse and the buffer hit rate. It then proposes a tradeoff between lightweight computational imbalance and heavyweight loading workload imbalance to speed up the overall training. It finally optimizes its data access pattern with HDF5 to achieve better parallel I/O throughput. Our evaluation with three scientific surrogates and 32 GPUs illustrates that SOLAR can achieve up to 24.4X speedup over PyTorch Data Loader and 3.52X speedup over state-of-the-art data loaders.
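To make the idea of a pre-determined shuffled index list more concrete: once the full access order is fixed in advance, a buffer can evict the sample whose next use lies farthest in the future (Belady's clairvoyant policy). The sketch below compares that policy with plain LRU on a toy access sequence. It is only an illustration of why a known access order helps; it is not SOLAR's actual eviction scheme, and the names `shuffled_epochs`, `belady_hits`, and `lru_hits` are hypothetical.

```python
import math
import random
from collections import OrderedDict
from typing import Dict, List


def shuffled_epochs(num_samples: int, epochs: int, seed: int) -> List[int]:
    """Pre-determine the shuffled access order for all epochs up front."""
    rng = random.Random(seed)
    order: List[int] = []
    for _ in range(epochs):
        idx = list(range(num_samples))
        rng.shuffle(idx)
        order.extend(idx)
    return order


def belady_hits(accesses: List[int], capacity: int) -> int:
    """Buffer hits when evicting the item whose next use is farthest away."""
    if capacity <= 0:
        return 0
    # next_use[pos] = position of the next access to the same item (inf if none).
    next_use: List[float] = [math.inf] * len(accesses)
    last_seen: Dict[int, int] = {}
    for pos in range(len(accesses) - 1, -1, -1):
        next_use[pos] = last_seen.get(accesses[pos], math.inf)
        last_seen[accesses[pos]] = pos
    buffer: Dict[int, float] = {}  # item -> position of its next use
    hits = 0
    for pos, item in enumerate(accesses):
        if item in buffer:
            hits += 1
        elif len(buffer) >= capacity:
            victim = max(buffer, key=lambda k: buffer[k])
            del buffer[victim]
        buffer[item] = next_use[pos]
    return hits


def lru_hits(accesses: List[int], capacity: int) -> int:
    """Buffer hits under least-recently-used eviction, for comparison."""
    if capacity <= 0:
        return 0
    buffer: "OrderedDict[int, None]" = OrderedDict()
    hits = 0
    for item in accesses:
        if item in buffer:
            hits += 1
            buffer.move_to_end(item)
        else:
            if len(buffer) >= capacity:
                buffer.popitem(last=False)  # drop the least recently used item
            buffer[item] = None
    return hits


if __name__ == "__main__":
    order = shuffled_epochs(num_samples=100, epochs=3, seed=0)
    print("Belady hits:", belady_hits(order, 20), "LRU hits:", lru_hits(order, 20))
```

Because the clairvoyant policy is optimal among policies that load a sample only when it is requested, its hit count is never lower than that of LRU on the same sequence.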
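Recently, cross-scene hyperspectral image (HSI) classification has attracted increasing attention. When the target domain (TD) must be processed in real time and cannot be reused for training, the model has to be trained on the source domain (SD) only and transferred directly to the TD. Based on the idea of domain generalization, a single-source domain expansion network (SDENET) is developed to ensure the reliability and effectiveness of domain expansion. The method uses generative adversarial learning to train on the SD and test on the TD. A generator comprising a semantic encoder and a morph encoder is designed to generate an extended domain (ED) based on an encoder-randomization architecture, in which spatial and spectral randomization are used to generate variable spatial and spectral information, and morphological knowledge is implicitly applied as domain-invariant information during domain expansion. Furthermore, supervised contrastive learning is employed in the discriminator to learn class-wise domain-invariant representations, which drives the intra-class samples of SD and ED together. Meanwhile, adversarial training is designed to optimize the generator so that intra-class samples of SD and ED are driven apart. Extensive experiments on two public HSI datasets and one additional multispectral image (MSI) dataset demonstrate the superiority of the proposed method over state-of-the-art techniques.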
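We present FOLIO, a human-annotated, open-domain, logically complex and diverse dataset for natural language (NL) reasoning, equipped with first-order logic (FOL) annotations. FOLIO consists of 1,435 examples (unique conclusions), each paired with one of 487 sets of premises that serve as rules for reasoning deductively about the validity of each conclusion. The logical correctness of the premises and conclusions is ensured by their parallel FOL annotations, which are automatically verified by our FOL inference engine. In addition to the main NL reasoning task, the NL-FOL pairs in FOLIO automatically constitute a new NL-FOL translation dataset that uses FOL as the logical form. Our experiments on FOLIO systematically evaluate the FOL reasoning ability of medium-sized language models (BERT, RoBERTa) under fine-tuning and of large language models (GPT-NeoX, OPT, GPT-3, Codex) under few-shot prompting. For NL-FOL translation, we experiment with GPT-3 and Codex. Our results show that one of the most capable publicly available large language models (LLMs), GPT-3 davinci, performs only slightly better than random with few-shot prompting on a subset of FOLIO, and that the model is especially bad at predicting the correct truth values for False and Unknown conclusions. Our dataset and code are available at https://github.com/yale-lily/folio.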
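We develop WOC, a webcam-based 3D virtual online chatroom for multi-person interaction, which captures the 3D motion of users and drives their individual 3D virtual avatars in real time. Compared with existing solutions based on wearable equipment, WOC offers convenient and low-cost 3D motion capture with a single camera. To promote an immersive chat experience, WOC provides high-fidelity manipulation of virtual avatars and also supports user-defined characters. Using a distributed data flow service, the system delivers highly synchronized motion and voice to all users. The system is deployed on a website and requires no installation, so users can freely experience virtual online chat at https://yanch.cloud.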
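Active domain adaptation (ADA) queries the labels of selected target samples to help adapt a model from a related source domain to a target domain. It has attracted increasing attention recently because of its promising performance at minimal labeling cost. Nevertheless, existing ADA methods have not fully exploited the local context of the queried data, which is important to ADA, especially when the domain gap is large. In this paper, we propose a novel framework, Local context-aware Active Domain Adaptation (LADA), which is composed of two key modules. The Local context-aware Active Selection (LAS) module selects target samples whose class probability predictions are inconsistent with those of their neighbors. The Local context-aware Model Adaptation (LMA) module refines the model with both the queried samples and their expanded neighbors, regularized by a context-preserving loss. Extensive experiments show that LAS selects more informative samples than existing active selection strategies. Furthermore, equipped with LMA, the full LADA method outperforms state-of-the-art ADA solutions on various benchmarks. Code is available at https://github.com/tsun/lada.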
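The selection idea described for LAS can be sketched as follows: score each target sample by how much its predicted class distribution disagrees with those of its nearest neighbors in feature space, then query the highest-scoring samples. The abstract does not specify the criterion, so the score used here (mean total-variation distance to the k nearest neighbors) and all function names are assumptions for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence


def knn_indices(features: Sequence[Sequence[float]], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbors of sample i (Euclidean distance)."""
    dists = [(math.dist(features[i], f), j) for j, f in enumerate(features) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]


def inconsistency_scores(
    features: Sequence[Sequence[float]], probs: Sequence[Sequence[float]], k: int
) -> List[float]:
    """Mean total-variation distance between each prediction and its neighbors'."""
    scores = []
    for i, p in enumerate(probs):
        nbrs = knn_indices(features, i, k)
        if not nbrs:
            scores.append(0.0)  # no neighbors available, nothing to disagree with
            continue
        tv = [0.5 * sum(abs(a - b) for a, b in zip(p, probs[j])) for j in nbrs]
        scores.append(sum(tv) / len(tv))
    return scores


def select_queries(
    features: Sequence[Sequence[float]],
    probs: Sequence[Sequence[float]],
    k: int,
    budget: int,
) -> List[int]:
    """Indices of the `budget` samples that disagree most with their neighborhood."""
    scores = inconsistency_scores(features, probs, k)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:budget]


if __name__ == "__main__":
    # Two toy clusters; sample 2 sits in the first cluster but is predicted as class 1.
    feats = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    preds = [(0.9, 0.1), (0.9, 0.1), (0.1, 0.9), (0.1, 0.9), (0.1, 0.9), (0.1, 0.9)]
    print(select_queries(feats, preds, k=2, budget=1))
```

The brute-force neighbor search is quadratic in the number of samples; a real pipeline would likely use an approximate nearest-neighbor index instead.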
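Enhancing model prediction confidence on unlabeled target data is an important objective in unsupervised domain adaptation (UDA). In this paper, we explore adversarial training on penultimate activations, i.e., the input features of the final linear classification layer. We show that this strategy is more efficient, and better correlated with the objective of boosting prediction confidence, than the adversarial training on input images or intermediate features used in previous works. Furthermore, since activation normalization is commonly used in domain adaptation to reduce the domain gap, we derive two variants and systematically analyze the effect of normalization on our adversarial training. This is illustrated both in theory and through empirical analysis on real adaptation tasks. Extensive experiments are conducted on popular UDA benchmarks under both the standard setting and the source-data-free setting. The results show that our method achieves the best scores compared with previous work.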